CHAPTER 5 


Joins, Temporary 
Tables, and 
Transactions 


In this chapter, we’ll discuss three additional features you can use to speed up 
your MySQL applications. While these aren’t direcdy related to one another, each 
represents an opportunity to decrease the amount of database or code overhead 
required to perform useful tasks with MySQL by combining queries or opera¬ 
tions on the code level into fewer units that perform more work. 

• Joins allow for the selection of data from multiple tables using a single SQL 
statement. 

• Temporary tables provide a way to organize data derived from queries into 
new tables, which can themselves be queried repeatedly over the lifetime 
of a MySQL user session. 

• Transactions allow you to group together related operations into logical 
units in such a way that all operations either succeed or fail together. 

We’ll spend some time with each of these features, discussing what it is, how 
it works, and how you can put it to use in your applications. 

The rationale behind joins is relatively simple: it’s more efficient to issue a 
single query than to use a series of them, with the resultset from the first query 
providing the conditions for one or more additional queries. There are several 
types of joins, which are distinguished chiefly by how they treat values in one 
table column that aren’t matched in the related column of the other table; we’ll 
cover each of these in turn. In addition, we’ll discuss the two basic styles for join 
notation (theta-style and ANSI-style) and the variations on these that are avail¬ 
able in MySQL. 

The use of temporary tables is another way to save time and effort, particu¬ 
larly when dealing with several queries that return very large and similar 
resultsets. When you find yourself dealing with the same subset of table data sev¬ 
eral times in a single session, it’s often faster and more economical to obtain it 
once and store it in a temporary table, rather than either saving the data in a 
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programming structure (such as an array or hash) or repeating a complex join 
several times. If you’re using several resultsets that contain a large proportion of 
data in common, it can also make sense to obtain a single resultset that has all 
the data that’s required, store this in a temporary table, and then select from this 
temporary table as needed. Temporary tables are very convenient to use in 
MySQL because they are unique to the user session in which they were created. 
We’ll explain just what this means, as well as how to use temporary tables. 

Transactions are beneficial because they make it much easier to guarantee 
data integrity. It’s also much more efficient to use transactions than to attempt to 
perform each query separately in your application logic, testing for its success or 
failure, and then undoing any previously successful operations in the event that 
one does fail. By using InnoDB or Berkeley DB (BDB) tables and transactions, 
you can let MySQL handle this task for you. Using transactions is not necessarily 
faster in and of itself than not using them; in fact, MylSAM tables (which don’t 
support transactions) are faster than either InnoDB or BDB tables (which do). 
However, you’ll almost certainly save time in development, and your applica¬ 
tions will require less code, because you don’t need to test and possibly undo 
each query individually. In this chapter, we’ll cover the basic theory of transac¬ 
tions and how they’re implemented in MySQL. Later in this book (in Chapter 7), 
you’ll see how these are used in PHP, Python, and Perl. 



Joins 


A join in MySQL or any other relational database is simply the selection of data 
from two or more related tables in a single query based on column values com¬ 
mon to all of those tables. The cardinal rule for relating tables can be stated as 
follows: Tables to be joined must have one or more columns sharing a set of values 
that allow those tables to be connected in some meaningful way. 

In other words, if we think of tables as modeling real objects, then joins are 
simply a way of relating objects according to the attributes they hold in com¬ 
mon. The column held in common by both tables is usually referred to as the 
common key or join key. Of course, it’s possible to have more than one common 
column between two tables, and so it’s possible to use more than one join key in 
any particular join. Most often, the join key will be the primary key of one table 
and a foreign key in the other. 

Before going any further, let’s provide a scenario that we’ll employ for 
generating some examples in the rest of this section. This represents a slight 
modification of the students/classes schema used in Chapter 3. This updated 
schema is shown in Figure 5-1. 
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Figure 5-1. Updated students!classes schema 

This schema consists of six tables from a database used for tracking students’ 
schedules and grades at a small college. Four of these characterize students, 
instructors, classes, and courses of study. (Obviously, this is greatly simplified for 
illustrative purposes.) We also have two lookup tables (students_classes and 
students_courses) linking students with classes and students with courses of 
instruction. In a real-world application, we might do some things differently, but 
we hope you’ll be able to overlook that for the time being. 


NOTE The students-tables.sql file is included in the ch5 folder of 
the code accompanying this book (available from the Downloads 
section of http://www.apress.comj. 2 yso in that folder you’ll find 
the students-data.sql file, which contains the SQL statements for 
inserting the test data we’ll be referring to in our examples. 



The following is the SQL code for generating the required tables: 

CREATE TABLE classes ( 

class_id int(ll) NOT NULL auto_increment, 
instructor^ int(ll) NOT NULL default 'O', 
name varchar(50) NOT NULL default ", 
hours int(l) NOT NULL default 'O', 

PRIMARY KEY (classid) 

>S 


4 


-o- 
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CREATE TABLE courses ( 

courseid int(ll) NOT NULL autoincrement, 
name varchar(SO) NOT NULL default ", 
PRIMARY KEY (courseid) 

); 


CREATE TABLE instructors ( 

instructor^ int(ll) NOT NULL auto_increment, 
course_id int(ll) NOT NULL default 'O', 
firstname varchar(50) NOT NULL default ", 
lastname varchar(50) NOT NULL default ", 
PRIMARY KEY (instructor_id) 

); 


CREATE TABLE students ( 

student_id int(ll) NOT NULL auto_increment, 
firstname varchar(50) NOT NULL default ", 
lastname varchar(50) NOT NULL default ", 
sex enum('M','F') NOT NULL default 'M', 
dob date NOT NULL default '0000-00-00', 

PRIMARY KEY (studentid) 

); 

CREATE TABLE students_classes ( 

student_id int(ll) NOT NULL default 'O', 
class_id int(ll) NOT NULL default 'O', 

semester enum('FALL','SPRING','SUMMER') NOT NULL default 'FALL', 
year int(4) NOT NULL default '2005', 
grade int(l) default NULL, 

PRIMARY KEY (student_id,class_id,semester,year) 

k 

# Note: For the grade column, we assume that the US system is being used: 

# A = 4, B = 3, C = 2, D = 1 , F = 0; for our purposes we'll assume that 

# a value of NULL represents incomplete status (class in progress, etc.) 

CREATE TABLE students_courses ( 

student_id int(ll) NOT NULL default 'O', 
courseid int(ll) NOT NULL default 'O', 
type enum('MA10R','MINOR') NOT NULL default 'MAJOR', 

PRIMARY KEY (student_id,course_id) 

); 
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NOTE We’ve constructed this in such a way that students may have 
double majors and/or minors. Limiting students to no more than 
two of each would need to be done in the application, as MySQL 
doesn’t yet support triggers; we’ll discuss this further in Chapter 8. 


While we’ve shown the foreign key relationships in Figure 5-1, we have not 
bothered to include them in the table definition statements. However, you 
should keep them mind, since join keys at least imply a foreign key relationship 
between the tables being joined, even if it’s not made mandatory through the 
use of constraints. 

As for the use of joins, consider the following problem: Suppose we want to 
know the name of the course area in which a given instructor teaches classes. We 
could do this by using two separate queries. First, we get the course area number 
from that instructor’s record in the instructors table, and then we plug that 
number into the courses table to obtain the name of the corresponding course: 

SELECT @cnum := course_id FROM instructors 

WHERE firstname = 'Mary' AND lastname = 'Williams'; 

SELECT name FROM courses WHERE courseid = @cnum; 

Notice that we employ a user variable in order to preserve the result of the 
first query and make it available to the second. As you learned in Chapter 4, this 
frees us from the need to create, set, and refer to an additional application vari¬ 
able in programming code. 

Here’s what happens when we run these two queries from the MySQL com¬ 
mand line: 



Since the course number (course_id column) is common to both tables, we 
can write a single query joining both tables using this as the common key for the 
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join. We merely combine any conditions required by each of the original two 
queries and set the columns common to both tables equal to one another: 

SELECT c.name 

FROM instructors i, courses c 

WHERE i.firstname = 'Mary' AND i.lastname = 'Williams' 

AND i.courseid = c.course_id; 

Here’s the result: 



This result is the same as that obtained by using the two previous queries in 
succession. 




Theta-Style Joins vs. ANSI-Style Joins 


There are two accepted styles for writing joins, known as theta-style joins and 
ANSI-style joins. Perhaps the best way to explain the difference is to show an 
example. Let’s suppose we want a listing of all instructors that shows the names 
of the courses of study for which they teach classes. Since the names of the 
instructors are in one table (instructors) and those of the courses are in another 
(courses), we’ll need to execute a join on these two tables in order to obtain the 
desired set of data. 

Theta-style join syntax uses commas to separate multiple table names and 
aliases, just as in the previous example: 

SELECT c.name 

FROM instructors i, courses c 

WHERE i.firstname = 'Mary' AND i.lastname = 'Williams' 

AND i.course_id = c.course_id; 

ANSI syntax uses the 10IN and ON keywords instead: 

SELECT c.name 

FROM instructors i 10IN courses c 

ON i.course_id = c.course_id 

WHERE i.firstname = 'Mary' AND i.lastname = 'Williams'; 
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The JOIN keyword is used to separate the names of the tables being joined, 
and the ON clause contains the equality relation showing which column is being 
used as the join key. Both varieties of join syntax are permissible in MySQL; how¬ 
ever, the ANSI syntax is generally preferable because it’s usually easier to read 
and understand, particularly when writing joins involving three or more tables. 
There are also some types of joins that can’t be written using theta-style notation 
in MySQL, as you’ll see shortly. 

In addition, MySQL supports a nonstandard extension of the ANSI syntax 
that can be used as a sort of shorthand for when the join column has the same 
name in both tables to be joined: 

SELECT c.name 

FROM instructors i JOIN courses c 
USING (course_id) 

WEIERE i.firstname = 'Mary' AND i.lastname = 'Williams'; 

This has the same result as our earlier example. 



The USING keyword is not supported in other database systems; however, if 
portability isn’t an issue, it can be handy for eliminating a bit of typing, as well as 
for conceptualization purposes. 


Join Types 

When joining two tables together, MySQL can handle rows that are or aren’t 
matched in one or both tables in several different ways. We’ll look briefly at each 
of these in turn. 


Cross Join 

Each row from the first table in a cross join is joined to all rows from the second. 
Also known as the Cartesian product of two tables, this type of join yields 









extremely large resultsets, the size of the resultset being the product of the 
number of rows in each table. Here is an example of a cross join written using 
theta-style notation: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i, courses c; 

Using ANSI-style notation, we would write this as follows: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
JOIN courses c; 

or like this: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
CROSS JOIN courses c; 


The two ANSI-style forms are equivalent in MySQL. 

The reason for this multiplication might be more apparent if you visualize a 
cross join as shown in Figure 5-2. Very simply, every row in the left-hand table of 
the join is matched to every row in the table on the right. For the sake of clarity, 
we’ve indicated only the matches on the first two rows of the instructors table, but 
you should be able to extrapolate from this and see that there will be 6 X 13 = 78 
rows in the resultset. (Don’t worry that we’re asking you to take this as merely an 
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Figure 5-2. A cross join matches every row in the tables. 
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article of faith; we’ll offer proof of a more concrete sort very shortly.) Note that the 
last row in instructors, for which there are no records in the courses table having 
the same value in the course_id column, is still matched against every row in the 
right-hand table. 

Assuming that we’re using the data supplied in students-data.sql, the result- 
set produced by this query (written in any of the three ways shown) would 
contain 78 rows (for 13 instructors and 6 course areas). Cross joins are very inef¬ 
ficient due to the sheer size of their resultsets and to the fact that, given a equal 
to the number of rows in the first table and b equal to the number of rows in the 
second, the proportion of redundant data in the result will be: 

aib- 1 ) = 1 1 

ba b 

In the example shown, approximately 92% (12/13) of the data returned is 
repetitive and therefore useless to us. 



NOTE If a join condition is not specified for any other type of join 
(except a natural join), MySQL will treat it as a cross join. This is true 
for most other databases as well. 


Inner Join 


An inner join is defined as a join in which unmatched rows from either table are 
not to be returned. 

Writing inner joins using the theta-style notation is just a matter of adding 
an appropriate WHERE clause that relates the columns comprising the join key: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i, courses c 
WHERE i.course_id = c.courseid; 

This join will produce a list of all instructors with the names of the course areas 
in which they teach. 

To accomplish the same thing in an ANSI-style join, use an ON or USING clause 
to define the join key: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
DOIN courses c 
USING (course_id); 


4 
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The USING keyword is specific to MySQL; the ANSI-standard equivalent to 
this join is as follows: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
JOIN courses c 

ON i.course_id = c.course_id; 

This query is illustrated in Figure 5-3, which shows how just a few of the 
rows on the left correspond to rows in the table on the right. The last row in the 
instructors table has a course_id value of 0; since there are no rows in courses 
with that value in the course_id column, the row from instructors isn’t included 
in the resultset. This is indicated by the X over the arrow in the diagram in 
Figure 5-3. 
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Figure 5-3. An inner join does not return unmatched rows. 


CAUTION You should never place restrictions on the rows to be 
returned in a join’s ON clause; only join conditions of the form 
tl.coll = t2.col2 (where tl and t2 are table aliases) should be 
placed here. Any restrictions intended to filter the resultset should 
be placed in a WHERE clause. 
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You may use the optional INNER keyword as well. However, you should note 
that using this does not by itself make your query into an inner join; in fact, 
without an ON or USING clause, MySQL will still treat the query as a cross join. 
Compare the following two queries. 



Unless there’s an actual need to find rows in one table that aren’t matched in 
another (and sometimes there can be, as you’ll see shortly), inner joins are gen¬ 
erally the most efficient joins to use. There’s no point in returning records you 
don’t need, and you’ll save time and effort by not being required to filter out NULL 
rows from your results. 


NOTE If a join condition but no join type is specified in an ANSI- 
style join, MySQL will treat the join as an inner join, just as SQL 
Server and PostgreSQL will. 



Left (Outer) Join 

Outer joins differ from inner joins in that outer joins will return records in one 
table that aren't matched in another. In a left outer join (or more simply, left- 
hand join or even just left join), all records from the first (left-hand) table in a 
join that meet any conditions set in the WHERE clause are returned, whether or 
not there’s a match in the second (right-hand) table. 

For example, let’s suppose we would like a list of all instructors whose last 
name begins with the letter A, along with any classes that they teach. If we want 
a list including only those instructors who actually teach any classes, we use an 
inner join: 












As previously mentioned, the INNER keyword is optional. We could also use 
theta-style syntax for this query: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors I, classes c 
WHERE i.instructorid = c.instructor_id 
AND i.lastname LIKE 'A%'; 

Using a left join, we can obtain a list of all instructors whose last name 
begins with A, whether or not there are any matching entries for those instruc¬ 
tors in the classes table. As you can see from Figure 5-4, there are three 
instructors whose last names begin with the letter A: William Anderson’s instruc¬ 
tor ID matches that listed for three classes, and the instructor ID for Frederick 
Albright is the same as that of four classes. Janet Allen’s instructor ID doesn’t 
match with that for any classes at all; since this is a left join, we show an arrow 
pointing from her record in the instructors table to the word NULL. 

The query and its result are as follows. 



There are no matching class records for the instructor named Janet Allen, so 
MySQL dutifully returns a row containing her first and last names in the corre¬ 
sponding columns from instructors and a NULL value for the name column that 
was requested from the classes table. 
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Figure 5-4. A left (outer j join returns all records from the left-hand table that 
meet any conditions in the WHERE clause, whether or not there’s a match in the 
right-hand table. 


Let’s consider for a minute what happens when we encounter this NULL value 
in an application. We probably wouldn’t want to display the word NULL, since 
it’s not very descriptive. Instead, we would likely prefer something a bit more 
user-friendly, along the lines of “No classes assigned.” Rather than test for the 
NULL value in our application code and make a suitable substitution there, we 
can use a flow-control operator to accomplish the same thing in the join itself. 
While we’re at it, let’s reduce the number of columns in the output to two by 
using the CONCATQ function on the instructor’s first and last names to form a 
single instructor column. We’ll also include an ORDER BY clause in the query to 
alphabetize the results by the instructor’s last name. 


































































3324CH05.qxd 9/21/04 9:53 AM Page 252 


Chapter 5 


SELECT 

CONCAT(i.firstname, ' ', i.lastname) AS instructor, 

IFNULL(c.name, '[Not currently assigned to any classes.]') AS class 
FROM instructors i 
LEFT I0IN classes c 
USING(instructor_id) 

WHERE i.lastname LIKE 'A%' 

ORDER BY i.lastname; 

The result looks like this in the MySQL Monitor: 



Generally speaking, you can employ any operators, built-in functions, 
and flow-control operators in multiple-table joins that you could use in SELECT 
queries from a single table. 


NOTE MySQL does not support Oracle-style (+) = or = (+) theta 
syntax for outer joins. If you need to specify a left join, you must use 
ANSI syntax with JOIN or LEFT OUTER 10IN. The same is true 

with respect to right joins: use RIGHT JOIN or RIGHT OUTER 10IN. 
Oracle 9 also implements the SQL92 syntax supported by MySQL, 
as does PostgreSQL 7.1 and later. 
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Right (Outer) Join 

A right outer join (or more, commonly, right or right-hand join) is similar to a left 
join, except that all rows from the second (or right-hand) table in the join that 
satisfy any included WHERE clause are returned, whether or not matching rows are 
found in the first (left-hand) table. MySQL supports ANSI-style right joins using 
either the RIGHT JOIN or RIGHT OUTER JOIN keywords. 

As you can see in Figure 5-5, this works as you would expect: in the opposite 
fashion from a left join. 
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Figure 5-5. A right (outer) join returns all records from the right-hand table that 
meet any conditions in the WHERE clause, whether or not there’s a match in the 
left-hand table. 


































































The following shows the query represented in Figure 5-5 being run in the 
MySQL Monitor. 



You can see that there are three instructors whose course_id column value 
is 3 (that is, they all teach English classes). Two instructors teach three classes 
each, and the third (Ronald Pinkerton) isn’t listed as teaching any classes at all. 

In this case, NULL values are returned in both columns in the classes table for 
rows that don’t match any instructor_id values from the instructors table. If this 
seems a bit confusing, try turning it into a left join: 

SELECT c.name, c.hours, i.firstname, i.lastname 
FROM instructors i 
LEFT JOIN classes c 
USING(instructor_id) 

WHERE i.course_id = 3; 

If you run this query in the MySQL Monitor, you’ll find that the results are 
exactly the same as those produced by the previous right join. 


TIP Left or right? In most cases, it really doesn’t matter whether you 
use left joins or right joins, as long as the tables to be joined are in 
the correct order. However, the recommended practice by most profes¬ 
sionals is to use left joins whenever possible. Using either one or the 
other (but not both) is desirable for reasons of consistency. In addi¬ 
tion, most people seem to find left joins easier to visualize than right 
joins when reading and writing queries. 


Full Join 

A full join returns all rows from both tables being joined that otherwise fulfill 
any conditions set in a query’s WHERE clause. All columns in rows from either 
table that don’t have matches in the other one are filled with NULL values. 
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MySQL doesn’t support explicit full joins; instead we’ll offer a couple of alter¬ 
native ways to simulate a full join later in this chapter, in the “Emulating a Full 
Join Using a UNION Query” and “Emulating a Full Join Using a Temporary Table” 
sections. 


CAUTION Some references state that the default join type in MySQL 
is the full join, but this is incorrect usage of the term full join, where 
cross join is what’s really meant. 


Natural Join 

A natural join is a MySQL-specific shortcut that performs the same task as an 
inner or left join in which the ON or USING clause refers to all columns that the 
tables to be joined have in common. Using this form: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
NATURAL JOIN classes c 
WHERE i.lastname LIKE 'AT; 

is the same as using this form: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
INNER JOIN classes c 
USING(instructor_id) 

WHERE i.lastname LIKE 'A%’; 

Similarly, you can make MySQL assume automatically that all same-named 
columns are to be used as join keys for a left outer join: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
NATURAL LEFT JOIN classes c 
WHERE i.lastname LIKE 'AT; 

This yields the same result as the following: 

SELECT i.firstname, i.lastname, c.name 
FROM instructors i 
LEFT JOIN classes c 
USING(instructor_id) 

WHERE i.lastname LIKE 'AT; 








CAUTION You can't use the INNER keyword with NATURAL I0IN. You’ll 
get a syntax error if you try to do this. 


You can also perform natural right joins, as in the following example. 



Since there are no instructors in the Botany department, the columns from 
the left-hand table in the row containing "Botany" from the right-hand table are 
filled with NULL values. 

By using the same name for related columns in different tables and 
NATURAL [LEFT | RIGHT] I0IN syntax, you can save a lot of typing in your joins. The 
principal drawback to the USING notation is that this isn’t portable from MySQL to 
other databases. It’s also true that someone who is not familiar with your table 
schemas may need to look them up before being to able to know for certain on 
which columns the tables in the query are being joined. 



TIP You can also use NATURAL LEFT OUTER JOIN and 
NATURAL RIGHT OUTER I0IN, in addition to what’s shown 
in the examples. 


Self Join 

Self joins aren’t used often, but they are very handy for one particular purpose: 
retrieving information that represents a hierarchy. Suppose we want to model 
the supervisory responsibilities for personnel in an office department, such as 
that represented by the tree graph in Figure 5-6. 
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Figure 5-6. A hierarchy of department personnel 

As shown here, Nora Doe (Employee #1) supervises James Wu (Employee #2) 
and Nancy Beck (Employee #3), James Wu supervises A1 Smith (Employee #4) 
and Mary Lester (Employee #5), and so on. It’s fairly straightforward to construct 
a table to hold this data: 

CREATE TABLE employees ( 

employee_id int(ll) NOT NULL autoincrement, 
supervisor^ int(ll) default NULL, 
firstname varchar(50) NOT NULL default ", 
lastname varchar(50) NOT NULL default ", 

PRIMARY KEY (employee_id) 

); 


This table contains an employee_id column to store the employee’s ID 
number, a supervisor_id column to hold the employee ID of the supervisor to 
whom the employee reports, and firstname and lastname columns to store the 
employee’s first and last names. We’ll allow the supervisor_id column of this 
table to take a NULL value to indicate an employee with no supervisors. The first 
two statements for inserting the data into this table look like this: 























INSERT INTO employees (employeeid, supervisor_id, firstname, lastname) 
VALUES (1, NULL, 'Nora', ’Doe'); 

INSERT INTO employees (employeeid, supervisor_id, firstname, lastname) 
VALUES (2, 1, 'Dames', 'Wu'); 


NOTE Writing the remaining INSERT statements based on the dia¬ 
gram in Figure 5-6 should be a trivial exercise. Alternatively, you can 
use the self-join.sql script included in the ch5 directory of the code 
download for this book (available from the Downloads section of 
http: //www. apress. com) to create and populate the employees table. 



Data that refers to other data stored in the same table exhibits what’s com¬ 
monly referred to as a recursive, or reflexive, relationship. A self join is used to 
extract this data in such a way that this recursive relationship is preserved, and it 
works by joining the table to itself. Since we can refer to the same table identifier 
(table name or alias) only once in a given query, we indulge in a couple of bits of 
alias trickery here: 

• We use the same table name twice, but use a different table alias each 
time. In effect, we’re telling MySQL to refer to the same table under two 
different guises. 

• Since the columns which we’re retrieving have the same names, we use 
column aliases to distinguish the columns returned from the table under 
the first table alias from those returned from the table under the table 
second alias. 

We’ll take the additional step of concatenating the first and last names of 
each employee as well. The resulting query might look like this: 
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If we want to include a record showing that Nora Doe is the department 
head (that she has no immediate supervisor), we can do that using an outer join. 
Since we’re displaying the supervisor column on the left side of the output, we’ll 
need to use a right join: 



Notice that we use IFNULLQ once again in order to substitute a descriptive 
message in place of the word NULL in Nora’s employee record, making use of the 
fact that the result of concatenating any value to NULL is also NULL. If we wanted 
to use a left join, we could rewrite this query as follows: 

SELECT 

IFNULL(CONCAT(e2.firstname, ' e2.1astname), "[DEPARTMENT HEAD]") 

AS supervisor, 

C0NCAT(el.firstname, 1 ', el.lastname) AS employee 
FROM employees el 

LEFT JOIN employees e2 ON e2.employee_id = el.supervisorid; 

We just switch the aliases used for the columns to be selected and in the ON 
clause. 

Other likely scenarios for using self joins include relating parts of items that 
themselves are used as parts of other items; department hierarchies in an orga¬ 
nization; and sections and subsections of a document, an application, or a 
web site. 


Unions 

Beginning with MySQL 4.0, you can use the UNION keyword to combine the 
results of multiple SELECT queries into a single resultset. This can be very useful 











in eliminating the need to store resultsets in programming data structures such 
as arrays or to employ temporary tables (which we’ll look at very shortly) in 
order to preserve intermediate results. 

The basic syntax for UNION is as follows: 


(SELECT ...) 


UNION [DISTINCT 

1 all] 

(SELECT ...) 


[UNION 


(SELECT ...) [.. 

•]] 


The SELECT statements can be any that are legal in MySQL, as long as each 
query yields the same number of columns. The parentheses surrounding the 
individual SELECT statements are required if you want to use an ORDER BY clause 
that affects the combined resultset. However, it’s good practice to use parenthe¬ 
ses in any case, to make your queries easier to read. 


CAUTION Prior to MySQL 4.1.1, there’s a further restriction in that 
the values of the columns in the first SELECT query’s resultset are used 
to determine the result types and lengths of the same columns for the 
combined resultset. This means that column values from the second 
query and any additional ones might be truncated or otherwise 
altered in order to match the sizes and types of those resulting from 
the first SELECT. 


Let’s look at a simple example. Suppose we have two tables listing a small 
firm’s salespeople and service technicians. Here’s the definition for the sales 
table: 

CREATE TABLE sales ( 

firstname varchar(50) NULL, 
lastname varchar(50) NULL 

); 


NOTE The table definitions and data for this example are included 
in the union.sql file in the ch5 directory of this book’s downloadable 
code. 


The structure of the service_techs table is identical to this one. In order to 
obtain a combined listing of all the employees from both tables in a single result, 
we can do this: 
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Notice that the ORDER BY clause following the second SELECT (and outside the 
parentheses surrounding it) controls the sort order for the entire resultset. 

By default, MySQL eliminates any duplicated rows from the combined 
resultset. Beginning with MySQL 4.0.17, you can indicate this behavior using the 
DISTINCT keyword. While it actually has no effect (since it represents the default 
behavior), it can serve as a reminder that you’re dropping any duplicates. The 
DISTINCT keyword is also required by the SQL standard, so using it will make your 
code more portable as well. 

If we want all rows to be included in the final result, we can use the ALL key¬ 
word, like so: 



Emulating a Full Join Using a UNION Query 

In some cases, you can UNION together a left join and a right join to simulate a full 
join, as shown in the following example. 
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In order for this to work, all the columns in the first query’s resultset must 
accept NULL values. (This is true through MySQL 5.0.0.) 

As an exercise, try writing a query (using the union.sql tables and data as a 
basis) whose output looks something like this: 


EMPLOYEE SALES TECH 

lane Anderson X X 

Franklin Williams X 

lerry Thomas X X 

[etc... ] 

Steve Norton X 

Mike Nelson X 


v CAUTION Any column names used in an ORDER BY clause applying 
to an entire union must be common to the resultsets produced by all 
of the SELECT queries making up the union. 


We’ll look at another method for simulating full joins using temporary tables 
a little later in this chapter. 














Temporary Tables 
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Now let’s talk about another advanced MySQL feature: temporary tables. These 
allow you to create a short-term storage place within the database itself for a set 
of data that you need to use several times in a single series of operations. One 
advantage of this is that you can use SQL to access the data, rather than using 
programming code, which means that if you need to port your application from, 
say, PHP to Java, there’s that much less code to be translated. There are addi¬ 
tional benefits to using temporary tables, as you’ll see shordy. 

Most often, it’s best to obtain a desired set of data in a single SELECT query. 
However, sometimes this simply isn’t possible, or you may want to work with 
subsets of the same, larger resultset over several successive operations. You can 
handle intermediate or temporary results for reuse within a single session in two 
basic ways: 

• Using programming constructs such as arrays, hashes, or objects and 
retrieving data from these when required by the application logic 

• Using database tables 

The second option is preferable because it tends to be faster, there’s less like¬ 
lihood of bugs in the programming code, and applications are more easily 
ported (the latter two reasons derive from the simple fact that there’s less code to 
manage). The one drawback to doing this is that you’re then required to manage 
the tables for storing the intermediate results. The solution to this problem is to 
use temporary tables, which are supported in MySQL 3.23 and later. 


NOTE Beginning with MySQL 5.0.1, views may offer another alter¬ 
native for handling intermediate or temporary results for reuse. See 
Chapter 8 for more information about views. 



Creating Temporary Tables 

To create a temporary table, simply include the TEMPORARY keyword in a table cre¬ 
ation statement. Otherwise, the statements used to create them are no different 
than those used to create normal tables. 

A temporary table can be of type MylSAM, HEAP, MERGE, or InnoDB. (Your 
MySQL installation must support InnoDB tables in order to use these as tempo¬ 
rary tables, of course.) You can also use ISAM as the table type for temporary 
tables in MySQL versions through the 4.0.x series. (Don’t forget that ISAM tables 
are disabled in MySQL 4.1 and removed altogether beginning with MySQL 5.0.0.) 








Temporary tables differ from normal tables in that temporary tables exist only 
for the duration of the current session and are automatically deleted after it ends. 
(For web applications, this means that temporary tables generally cease to exist 
upon completion of the current page or script.) The following is a simple example 
(this particular example was produced on a PC running Windows 2000 Server, but 
the results will be the same regardless of operating system or platform). 



Notice that the table was deleted as soon as the session was ended using the 
\q (or quit) command. You can verify this by logging in again as the same user 
and trying to query the table you created previously. One interesting and poten¬ 
tially useful side effect of this behavior is that multiple users can employ the 
same table names for temporary tables without fear of collisions between them. 
This also means that different users can’t access each other’s temporary tables. If 
one user or process needs to access a table created by a different one, you’ll need 
to omit the TEMPORARY keyword from the table definition and take care of mainte¬ 
nance issues (such as deleting the table once there’s no further use for it) in your 
application code. 


CAUTION In MySQL, you can’t refer to the same temporary table 
more than once in a single query. This means you can’t do a self join 
on a temporary table. 
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Emulating a Full Join Using a Temporary Table 

We’ve already looked at one way of duplicating what a full join does by using a 
UNION query. Now we’ll demonstrate how to simulate a full join with a temporary 
table, in the event you’re working with a MySQL version earlier than 4.0. This will 
be fairly straightforward, but should serve as a good example of the use of a tem¬ 
porary table. 

We’ll use the same sales and service_techs tables defined and populated in 
the union.sql script, which we used in our earlier example of emulating a full 
join. We perform the emulation in four steps: 

1. Create the temporary table used to store the intermediate result. 

2. Insert the data returned by a left join on the two employees tables. 

3. Insert the data from a right join on the same two tables. 

4. Select the final data set from the temporary table. 

We’ll actually combine steps 1 and 2 using CREATE TABLE ... SELECT syntax, 
as this is just as valid for temporary tables as it is for normal ones. The left join is 
simply the one we used earlier in the first part of the UNION join. 



Here is the schema of the temporary table we just created: 
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And here’s the data that we inserted: 



Now we need to insert the data from a right join on the same two tables. We 
already have the necessary SELECT query for this. All we need to do is turn it into 
an INSERT SELECT, as follows. 



The temporary table now contains the following data: 



We have some duplicate rows. We can take care of this problem easily 
enough by using a SELECT DISTINCT instead: 
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This is the same data that we obtained using the UNION join earlier. 


NOTE You can also use temporary tables to simulate subselects in 
older versions of MySQL that don’t support these. We’ll look at how 
this can be accomplished when we discuss subqueries in Chapter 8. 



Transactions 

Both joins and temporary tables can help reduce the amount of application code 
needed to get a given amount of work out of MySQL. A third way to streamline 
your applications that use MySQL is to employ transactions. 

A transaction is simply a group of related operations that make up a logical 
or conceptual whole. For example, when you make a purchase using a credit 
card from a web site, your cardholder account must be debited and the site 
owner’s merchant account must be credited. When problems such as a power 
outage, a system failure, or loss of a network connection occur, it is possible that 
all of these necessary operations may not take place, causing data integrity prob¬ 
lems. (For example, your card is debited, but the merchant’s account doesn’t get 
the credit for the sale.) Transactional features have been added to most popular 
databases in order to provide a solution to exactly this type of problem. Since 
version 3.23-max and all 4.0 versions and later, MySQL has provided transac¬ 
tional support using the InnoDB and BDB storage engines. 

In MySQL (when using InnoDB or BDB tables) or in any other transactional 
database, a transaction must follow what are known as the ACID rules: 
















• Atomicity: All operations associated with any given transaction must 
occur as a single unit. If any single operation fails, then the transaction 
does not take place, and the database is returned to its previous state. This 
is often stated like this: In the event of the failure of an operation that 
makes up part of the transaction, the transaction is not committed, and 
the transaction is rolled back. 

• Consistency: The system’s state following the transaction must be consis¬ 
tent with its original state. For example, if you’re transferring money 
between two checking accounts, the total of the two accounts must be 
the same before and after the transfer takes place. If the balance of one 
account has increased by $100, then the balance of the other account 
must have decreased by the same amount. 

• Isolation: Each transaction must appear to be independent of all other 
actions taking place in the system. In other words (to use our banking 
example again), the system must behave as though the two accounts 
between which funds are being transferred have exclusive use of the sys¬ 
tem while this transaction is taking place. (In practice, true concurrent 
isolation is fairly difficult to achieve without causing major performance 
problems, and what usually happens instead is some sort of sequential 
prioritization.) 

• Durability: Simply put, once a transaction is completed, it must stay com¬ 
pleted. This is achieved in MySQL and other transaction-safe databases by 
means of a transaction log file, which is updated whenever a transaction is 
completed. 

The main benefit of using transactional tables with your application is that 
you can ensure data integrity and concurrency in the event that something 
unexpected may occur. This is very important in the business world, such as for 
financial institutions. 


NOTE Both InnoDB and BDB tables support transactions, but there 
are some problems with BDB tables. They tend to be slow when large 
numbers of them are simultaneously open; they can’t be moved 
between directories; and you may not be able to delete BDB tables 
unless you’re running in auto-commit mode. Also, BDB tables cur¬ 
rently aren’t supported on Linux running on other than 32-bit Intel 
processors, and they aren’t supported at all on Mac OS X. For these 
and other reasons, we recommend that you use InnoDB tables when 
you need to support transactions in a production environment. 
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MySOL Transaction Commands 

MySQL has three commands for use in performing transactions: 

BEGIN: This command begins a transaction. MySQL supports BEGIN WORK 
as an alias for this command. Since MySQL 4.0.11, you can also use the 
standard SQL command START TRANSACTION. Alternatively, you may use 
SET AUTOCOMMIT = 0;. With all of these commands, changes are not writ¬ 
ten to disk and logged until a COMMIT statement has been issued. The 
difference between SET AUTOCOMMIT = 0; and the others is that it disables 
auto-commit mode until you explicidy turn it back on (by issuing a 
SET AUTOCOMMIT = l;); the others merely disable auto-commit mode 
temporarily, until either a COMMIT or ROLLBACK command has been issued. 

ROLLBACK: If there is a failure in any of the queries required for the transac¬ 
tion, you can issue this command to cancel the transaction and return the 
database to its previous state. Issuing a ROLLBACK requires the “undoing” of 
each query that was successful prior to the advent of the error condition 
that necessitated it. Fortunately, thanks to improved versioning tech¬ 
niques, this doesn’t take as long as you might think it would. 

COMMIT: Once all required queries are completed successfully, this com¬ 
mits the transaction: that is, all changes are saved to the transaction log 
on disk or in other permanent storage. COMMIT statements execute fairly 
rapidly, since there is no requirement for any actual additional work in 
the database to be done, only that operations be recorded. 

In MySQL you cannot nest transactions. If you issue a BEGIN statement for a 
given user while that user has a pending transaction, MySQL will treat it as a 
COMMIT followed by a BEGIN. In other words, MySQL allows individual users to per¬ 
form only sequential transactions. 


NOTE MySQL does not support save points as do some other data¬ 
bases. You can only commit or roll back a complete transaction. 



Transaction Processing Considerations 

It’s best to keep transactions as small as possible. Since MySQL must guarantee 
that transactions belonging to different users are kept separate, this means that 
table rows involved in those transactions must be kept locked (and thus not 
accessible by other others) for the duration of each transaction. In addition, 
since transactions must be logged as sets of queries, using the minimum 








number of SQL statements possible per transaction ensures that your applica¬ 
tion isn’t slowed down by the need to log a large number of queries before the 
next transaction can commence. 

You should perform all transactions before and after obtaining user input. 
Don’t write your application in such a way that a transaction is in progress while 
awaiting user response. Imagine what could happen if a user decided to leave for 
his lunch break while the database was waiting for one or more of his transac¬ 
tions to finish! Plan your application in such a way that all necessary data is 
obtained, then the transaction performed, before collecting the next item or set 
of data. 


Summary 

In this chapter, we’ve looked at three features that can help increase the efficiency 
of your MySQL applications: joins (including UNION joins), temporary tables, and 
transactions. Each of these can be used in different ways to cut down on the 
number of queries required to isolate the data you need, reduce or eliminate the 
need to store data as part of application logic, decrease the number of bugs in 
programming code, and help guarantee data integrity. 

Using joins reduces the number of queries needed to obtain a given set of 
data by allowing you to combine queries. These also tend to make applications 
more efficient because they can often eliminate the need to store data in pro¬ 
gramming constructs so that they can be used in subsequent queries. They make 
applications less error-prone because less code means less of a chance for bugs 
to creep into the codebase. We looked at all of the major join types supported by 
MySQL. 

Temporary tables are useful when you want to store the results of queries in 
tables for reuse several times during a single user or application session, rather 
than in programming constructs such as arrays or hashes. MySQL’s temporary 
tables are unique to each session, and they are cleaned up for you at the end of 
each session, which cuts down drastically on table management requirements 
and worries about issues such as table name collisions. They’re also useful in 
cases where it’s unwieldy or even impossible to derive a desired set of data in a 
single query, such as in our example using a temporary table to simulate a full 
join, which isn’t currently supported in MySQL. 

Transactions, supported in MySQL via the InnoDB and BDB storage engines, 
are useful because they provide a mechanism whereby queries can be grouped 
together logically and performed as a unit. This is extremely important in scenar¬ 
ios where the state of the database must be preserved; for example, in the case of 
transferring funds between checking accounts, where the total of the amounts in 
both accounts must be the same before and after the transfer. Without transac¬ 
tions, you must lock all affected tables and track the success or failure of each 
single query required to complete the transfer, and in the event of failure, you 
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must be prepared to perform the inverse of each query that succeeded up to that 
point. (There are also concurrency and other issues involved, but we won’t dwell 
on those here.) Because transactions handle commits and rollbacks for you auto¬ 
matically, you’re saved a tremendous amount of code overhead. Many database 
APIs that are compatible with MySQL provide enhanced support for transactions, 
including Python’s MySQLdb module and ext/mysqli in PHP 5, as you'll see in 
Chapter 7, when we discuss MySQL application programming. 

What's Next 

In the next chapter, we’ll look at additional ways to speed up and improve the 
efficiency of MySQL-backed applications by identifying botdenecks in them as 
they’re running. Some of these methods include analysis of log files and status 
variables, evaluation of table design and queries, caching issues, application 
bloat, and the configuration of the MySQL server. 
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